The effect of alternative tree representations on tree bank grammars

Author

  • Mark Johnson
Abstract

The performance of PCFGs estimated from tree banks is sensitive to the particular way in which linguistic constructions are represented as trees in the tree bank. This paper presents a theoretical analysis of the effect of different tree representations for PP attachment on PCFG models, and introduces a new methodology for empirically examining such effects using tree transformations. It shows that one transformation, which copies the label of a parent node onto the labels of its children, can improve the performance of a PCFG model in terms of labelled precision and recall on held out data from 73% (precision) and 69% (recall) to 80% and 79% respectively. It also points out that if only maximum likelihood parses are of interest then many productions can be ignored, since they are subsumed by combinations of other productions in the grammar. In the Penn II tree bank grammar, almost 9% of productions are subsumed in this way.
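The parent-label transformation mentioned in the abstract can be illustrated with a minimal sketch. The nested-tuple tree encoding, the `^` separator, and the function names `parent_annotate` and `productions` are assumptions made for illustration, not the paper's own notation. The sketch annotates every non-root node, including preterminals, which may differ from the exact definition used in the paper.

```python
def parent_annotate(tree, parent=None):
    """Return a copy of `tree` in which every non-root node label
    is suffixed with the label of its parent (hypothetical '^' separator)."""
    if isinstance(tree, str):          # a word (leaf): leave it untouched
        return tree
    label, *children = tree
    new_label = label if parent is None else f"{label}^{parent}"
    # Children are annotated with the ORIGINAL label of this node.
    return (new_label, *(parent_annotate(c, label) for c in children))


def productions(tree):
    """Yield (lhs, rhs) productions read off a tree, top-down."""
    if isinstance(tree, str):
        return
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield (label, rhs)
    for child in children:
        yield from productions(child)


# Toy tree, not taken from any tree bank.
tree = ("S",
        ("NP", ("PRP", "She")),
        ("VP", ("VBD", "saw"),
               ("NP", ("DT", "the"), ("NN", "dog"))))

annotated = parent_annotate(tree)
rules = list(productions(annotated))
```

After annotation, the subject NP is labelled `NP^S` and the object NP is labelled `NP^VP`, so a PCFG estimated from the transformed trees can assign the two positions different expansion probabilities.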


Related resources

Studying impressive parameters on the performance of Persian probabilistic context free grammar parser

In linguistics, a tree bank is a parsed text corpus that annotates syntactic or semantic sentence structure. The exploitation of tree bank data has been important ever since the first large-scale tree bank, the Penn Treebank, was published. However, although originating in computational linguistics, the value of tree banks is becoming more widely appreciated in linguistics research as a whole. F...


Alternating Regular Tree Grammars in the Framework of Lattice-Valued Logic

In this paper, two different ways of introducing alternation for lattice-valued (referred to as L-valued) regular tree grammars and L-valued top-down tree automata are compared. One is the way which defines the alternating regular tree grammar, i.e., alternation is governed by the non-terminals of the grammar and the other is the way which combines state with alternation. The first way is ta...


Data-Driven Compilation of LFG Semantic Forms

In a recent paper, van Genabith et al. (1999) describe a semi-automatic method for annotating tree banks with high-level Lexical Functional Grammar (LFG) f-structure representations. First, a CF-PSG is automatically induced from the tree bank using the method described in (Charniak, 1996). The CF-PSG is then manually annotated with functional schemata. The resulting LFG is then used to determin...


The effect of alternative tree representations on tree bank grammars

Mark Johnson, Cognitive and Linguistic Sciences, Box 1978, Brown University, Providence, RI 02912, USA. Abstract: The performance of PCFGs estimated from tree banks is shown to be sensitive to the particular way in which linguistic constructions are represented as trees in the tree bank. This paper presents a theoretical analysis of the effect of different tree represen...


Tree-bank Grammars

By a "tree-bank grammar" we mean a context-free grammar created by reading the production rules directly from hand-parsed sentences in a tree bank. Common wisdom has it that such grammars do not perform well, though we know of no published data on the issue. The primary purpose of this paper is to show that the common wisdom is wrong. In particular we present results on a tree-bank grammar base...



Publication date: 2007